591 TFLOPS Multi-trillion Particles Simulation on SuperMUC

Authors

  • Wolfgang Eckhardt
  • Alexander Heinecke
  • Reinhold Bader
  • Matthias Brehm
  • Nicolay Hammer
  • Herbert Huber
  • Hans-Georg Kleinhenz
  • Jadran Vrabec
  • Hans Hasse
  • Martin Horsch
  • Martin Bernreuther
  • Colin W. Glass
  • Christoph Niethammer
  • Arndt Bode
  • Hans-Joachim Bungartz
Abstract

Anticipating large-scale molecular dynamics (MD) simulations in nano-fluidics, we conduct performance and scalability studies of an optimized version of the code ls1 mardyn. We present our implementation, which requires only 32 bytes per molecule and allows us to run what is, to our knowledge, the largest MD simulation to date. We explain our optimizations tailored to the Intel Sandy Bridge processor, including vectorization as well as shared-memory parallelization to make use of Hyperthreading. Finally, we present results for weak and strong scaling experiments on up to 146,016 cores of SuperMUC at the Leibniz Supercomputing Centre, achieving a speed-up of 133k, which corresponds to an absolute performance of 591.2 TFLOPS.
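
The headline figures in the abstract can be related with some simple arithmetic. The sketch below is not taken from the paper. It assumes the quoted speed-up is measured against a single core (the abstract does not state the baseline), treats "133k" as 133,000, and uses a hypothetical molecule count of 10^12 purely to illustrate the 32-byte memory footprint. All function and variable names are illustrative.

```python
# Figures quoted in the abstract.
CORES = 146_016
SPEEDUP = 133_000            # "133k" as rounded in the abstract
TOTAL_FLOPS = 591.2e12       # 591.2 TFLOPS
BYTES_PER_MOLECULE = 32


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speed-up divided by core count (assumes a single-core baseline)."""
    return speedup / cores


def flops_per_core(total_flops: float, cores: int) -> float:
    """Average sustained FLOPS delivered by each core."""
    return total_flops / cores


def memory_bytes(num_molecules: int) -> int:
    """Total molecule storage at 32 bytes per molecule."""
    return num_molecules * BYTES_PER_MOLECULE


efficiency = parallel_efficiency(SPEEDUP, CORES)
per_core_gflops = flops_per_core(TOTAL_FLOPS, CORES) / 1e9

# Hypothetical molecule count, chosen only for illustration (an assumption).
hypothetical_molecules = 10**12
memory_tb = memory_bytes(hypothetical_molecules) / 1e12

print(f"parallel efficiency: {efficiency:.1%}")
print(f"per-core rate: {per_core_gflops:.2f} GFLOPS")
print(f"memory for 1e12 molecules: {memory_tb:.0f} TB")
```

Under these assumptions the run retains roughly nine tenths of ideal scaling, and each trillion molecules costs 32 TB of storage, which is why the compact per-molecule layout matters for a multi-trillion particle run.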

Similar resources

The Performance of the Intel TFLOPS Supercomputer

The purpose of building a supercomputer is to provide superior performance on real applications. In this paper, we describe the performance of the Intel TFLOPS Supercomputer starting at the lowest level with a detailed investigation of the Pentium® Pro processor and the supporting memory subsystem. We follow this with a description of the benchmarks used to track the performance of the machine ...

An Overview of the Intel TFLOPS Supercomputer

Computer simulations needed by the U.S. Department of Energy (DOE) greatly exceed the capacity of the world’s most powerful supercomputers. To satisfy this need, the DOE created the Accelerated Strategic Computing Initiative (ASCI). This program accelerates the development of new scalable supercomputers and will lead to a supercomputer early in the next century that can run at a rate of 100 tri...

I/O for TFLOPS Supercomputers

Scalable parallel computers with TFLOPS (Trillion FLoating Point Operations Per Second) performance levels are now under construction. While we believe TFLOPS processor technology is sound, we believe the software and I/O systems surrounding them need improvement. This paper describes our view of a proper system that we built for the nCUBE parallel computer and which is now commercially availab...

High-Performance Small-Scale Simulation of Star Clusters Evolution on Cray XD1

In this paper, we describe the performance of an N-body simulation of a star cluster with 64k stars on a Cray XD1 system with 400 dual-core Opteron processors. A number of astrophysical N-body simulations were reported in SCxy conferences. All previous entries for Gordon-Bell prizes used at least 700k particles. The reason for this preference of large numbers of particles is the parallel effici...

Scaling of the GROMACS 4.6 molecular dynamics code on SuperMUC

Here we report on the performance of GROMACS 4.6 on the SuperMUC cluster at the Leibniz Rechenzentrum in Garching. We carried out benchmarks with three biomolecular systems consisting of eighty thousand to twelve million atoms in a strong scaling test each. The twelve million atom simulation system reached a performance of 49 nanoseconds per day on 32,768 cores.

Publication date: 2013